Matching with don't-cares and a small number of mismatches

Authors

  • Chaim Linhart
  • Ron Shamir
Abstract

In matching with don’t-cares and k mismatches we are given a pattern of length m and a text of length n, both of which may contain don’t-cares (a symbol that matches all symbols), and the goal is to find all locations in the text that match the pattern with at most k mismatches, where k is a parameter. We present new algorithms that solve this problem using a combination of convolutions and a dynamic programming procedure. We give randomized and deterministic solutions that run in time O(nk² log m) and O(nk³ log m), respectively, and are faster than the most efficient extant methods for small values of k. Our deterministic algorithm is the first to obtain an O(poly(k) · n log m) running time.
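To make the role of convolutions concrete, here is a minimal sketch of the textbook reduction from mismatch counting to cross-correlations of 0/1 indicator vectors. It is not the algorithm of the paper: it uses one correlation per pattern symbol, has no dynamic programming step, and evaluates each correlation by direct summation, so it runs in O(nm) time per symbol. An FFT-based correlation could be substituted to bring each correlation down to O(n log m). The function names and the choice of "?" as the don't-care symbol are assumptions made for illustration.

```python
def correlate(a, b):
    """Return c[i] = sum_j a[i + j] * b[j] for every full alignment i."""
    n, m = len(a), len(b)
    return [sum(a[i + j] * b[j] for j in range(m)) for i in range(n - m + 1)]


def mismatch_counts(text, pattern, wildcard="?"):
    """Count mismatches at each alignment; a wildcard on either side never mismatches."""
    # Start from the number of aligned pairs in which neither symbol is a wildcard.
    t_solid = [int(c != wildcard) for c in text]
    p_solid = [int(c != wildcard) for c in pattern]
    counts = correlate(t_solid, p_solid)
    # Subtract the pairs that agree, one pattern symbol at a time.
    for sym in set(pattern) - {wildcard}:
        t_ind = [int(c == sym) for c in text]
        p_ind = [int(c == sym) for c in pattern]
        for i, agree in enumerate(correlate(t_ind, p_ind)):
            counts[i] -= agree
    return counts


def k_mismatch_positions(text, pattern, k, wildcard="?"):
    """Return every alignment whose mismatch count is at most k."""
    counts = mismatch_counts(text, pattern, wildcard)
    return [i for i, d in enumerate(counts) if d <= k]
```

For example, with text "ab?abcab" and pattern "a?c", the alignments at positions 0 and 3 match exactly, and position 2 matches with one mismatch.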


Similar resources

On pattern matching with k mismatches and few don't cares

We consider the problem of pattern matching with k mismatches, where there can be don't care or wild card characters in the pattern. Specifically, given a pattern P of length m and a text T of length n, we want to find all occurrences of P in T that have no more than k mismatches. The pattern can have don't care characters, which match any character. Without don't cares, the best known algorith...
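The problem statement above can be pinned down with a naive O(nm) reference scan. This is only a brute-force baseline under stated assumptions, not the method of any of the listed papers: the function name is hypothetical, "?" stands in for the don't-care symbol, and don't-cares are assumed to occur in the pattern only.

```python
def naive_k_mismatch(text, pattern, k, wildcard="?"):
    """Return all positions where pattern occurs in text with at most k mismatches."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        mismatches = 0
        for j, p in enumerate(pattern):
            # A wildcard in the pattern matches any text symbol.
            if p != wildcard and p != text[i + j]:
                mismatches += 1
                if mismatches > k:
                    break  # Give up on this alignment early.
        else:
            # The inner loop finished without exceeding k mismatches.
            hits.append(i)
    return hits
```

A baseline like this is mainly useful for checking faster algorithms on small inputs.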


A Filtering Algorithm for k-Mismatch with Don't Cares

We present a filtering-based algorithm for the k-mismatch pattern matching problem with don’t cares. Given a text t of length n and a pattern p of length m with don’t care symbols in either p or t (but not both), and a bound k, our algorithm finds all the places that the pattern matches the text with at most k mismatches. The algorithm is deterministic and runs in Θ(nm^(1/3) k^(1/3) log^(2/3) m) time.


Pattern matching with don't cares and few errors

We present solutions for the k-mismatch pattern matching problem with don’t cares. Given a text t of length n and a pattern p of length m with don’t care symbols and a bound k, our algorithms find all the places that the pattern matches the text with at most k mismatches. We first give a Θ(n(k + log m log k) log n) time randomised algorithm which finds the correct answer with high probability....


k-Mismatch with Don't Cares

We give the first non-trivial algorithms for the k-mismatch pattern matching problem with don’t cares. Given a text t of length n and a pattern p of length m with don’t care symbols and a bound k, our algorithms find all the places that the pattern matches the text with at most k mismatches. We first give an O(n(k + log n log log n) log m) time randomised solution which finds the correct answer ...


Approximate String Matching with Variable Length Don't Care

Searching for DNA or amino acid sequences similar to a given pattern string is very important in molecular biology. In fact, a lot of programs and algorithms have been developed. Most of them are based on alignment of strings or approximate string matching. However, they do not seem to be adequate in some cases. For example, the DNA pattern TATA (known as TATA box) is a common promoter that oft...

Journal:
  • Inf. Process. Lett.

Volume 109

Published 2009